Melanoma is a type of skin cancer that is considered a malignant tumor.
This dangerous disease causes many deaths worldwide, so the early
detection of skin lesions is an important task in the diagnosis of skin
cancer. Recently, deep learning (DL), a machine learning paradigm, has been
used for skin lesion classification. However, in some previous studies,
seven-class skin lesion classification based on a single DL approach with a
CNN architecture did not produce satisfactory performance. The DL approach
allows the development of medical image analysis systems with improved
performance, for example by using deep convolutional neural networks
(DCNNs). In this study, we propose an ensemble learning approach that
combines three DCNN architectures, Inception V3, Inception ResNet V2, and
DenseNet 201, to improve performance in terms of accuracy, sensitivity,
specificity, precision, and F1-score. Seven classes of skin lesions are
considered, using 10015 dermoscopy images from the well-known HAM10000
dataset. The proposed model produces good classification performance with
97.23% accuracy, 90.12% sensitivity, 97.73% specificity, 82.01% precision,
and 85.01% F1-score. This method gives promising results in classifying
skin lesions for cancer diagnosis.


1. INTRODUCTION

A skin lesion is an area of skin tissue that has an abnormal growth or appearance compared to the
surrounding skin [1]. Some types of lesions are potentially cancerous. One type of skin cancer that is
considered a malignant tumor is melanoma [2]. Melanoma causes many deaths worldwide [3], so it is very
important to diagnose it at an early stage to improve patient survival [4]. Melanoma can be detected by
medical diagnosis using digital imaging, known as dermoscopy [5]. Dermoscopy is a non-invasive imaging
technique that obtains an enlarged image of a skin lesion by the use of polarized light [6]. This technique
visualizes features of pigmented skin lesions that cannot be seen and assessed directly [7]. Although this
method increases the accuracy of diagnosis, the process is very complex and error-prone [5]. Therefore, an
automated, computerized system for diagnosis from digital images is needed to support decision making.

The classification of dermoscopic images of skin lesions has long been studied in the literature.
Various methods have been proposed, ranging from conventional methods such as traditional feature
extraction techniques based on texture and color features [8], border-based texture analysis and wavelet
decomposition [9], and rule-based image processing or segmentation algorithms [10], to machine learning
(ML) methods such as the support vector machine (SVM) [11], decision tree [12], and neural network [13].
However, these methods are still not optimal because the classification of dermoscopy images requires
feature extraction and additional processing of the dataset, where the model parameters are not directly
reduced [14]. More recently, an ML method has been developed that overcomes the feature extraction
problem by learning features through a different, deep structure. Because learning is carried out in depth
across many layers to recognize the image, this approach is named deep learning (DL).

In recent years, many studies have utilized the DL approach for medical applications [15]-[17].
The advantage of this approach is that features are learned automatically, which makes it suitable for
handling large datasets. The convolutional neural network (CNN) is one of the DL methods and is considered
to have the best architecture for image classification in several applications. In medical applications in
particular, CNNs with various architectures show good performance in the classification, segmentation, and
detection of medical images [16]. CNNs rely on a conceptual framework that includes weight sharing, local
receptive fields, and spatial down-sampling. As a result, they are relatively invariant to displacement,
distortion, and scaling [18]. Specifically, in the classification of skin lesions to detect skin cancer, some
researchers have applied single CNNs with good results [19]-[21]. However, previous research based on
CNNs classified skin lesions into a limited number of classes, only two or three. There are many more
categories of dermoscopy images to diagnose, and if a single CNN architecture is used to classify a larger
number of skin lesion classes, the classifier performance decreases. Hence, a classifier that covers several
classes of skin lesions with good performance in terms of accuracy, sensitivity, specificity, precision, and
F1-score is desirable.

In this paper, we propose a deep convolutional neural network approach to classify several
diagnostic categories of skin lesions. To improve the classification performance, we build an ensemble of
three CNN architectures: Inception V3, Inception ResNet V2, and DenseNet 201. This paper is organized as
follows. In section 2, we briefly describe the dataset, the data pre-processing, and the CNN classifiers
together with the ensemble model that combines Inception V3, Inception ResNet V2, and DenseNet 201. In
section 3, we present the results and analysis of the experiment. Finally, in section 4 we draw some
conclusions.


2. MATERIAL AND METHODS 

This paper proposes a new approach to classifying skin lesions into seven different classes. We use
an ensemble model that combines three CNN architectures: Inception ResNet V2, Inception V3, and
DenseNet 201. The method consists of data preprocessing, an ensemble of CNNs, and evaluation of the
classifier based on performance metrics.


2.1. Preprocessing data 

The input to the network is a dermoscopy image from the HAM10000 dataset, which is publicly
available through the International Skin Imaging Collaboration (ISIC) 2018 archive [22]. The dataset was
obtained from patients of various ages and genders. Dermoscopy image samples can be seen in Figure 1,
which shows one sample from each diagnostic category. The dataset contains 10,015 dermoscopy images
covering all important diagnostic groups in the field of pigmented skin lesions: actinic keratoses and
intraepithelial carcinoma/Bowen's disease (akiec, 327 images), basal cell carcinoma (bcc, 514 images),
benign keratosis-like lesions (bkl, 1099 images), dermatofibroma (df, 115 images), melanoma (mel, 1113
images), melanocytic nevi (nv, 6705 images), and vascular lesions (vasc, 142 images), as summarized in
Table 1.

The original images in the dataset are in JPEG format with 450x600 pixels, which is too large, so
each image is resized to 192x256 pixels. In this paper, a stratified split was applied to divide the dataset
into 8111 images for the training set, 902 images for the validation set, and 1002 images for the testing
set. Each image is then normalized by dividing its pixel values by 255. To increase the amount of training
data without removing the essence of the data, a real-time data augmentation module was also added to our
platform. The purpose of data augmentation is to increase the number of skin images and to reduce
overfitting of the network. The augmentation consists of rotation with an angle of 60°, shear with
probability 0.2, zoom with probability 0.2, width shift with probability 0.2, and height shift with
probability 0.2.
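
The stratified split and the normalization step can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `stratified_split` and `normalize`, the random seed, and the per-class rounding rule are assumptions, and the class counts are those of Table 1. Because of per-class rounding, the subset sizes only approximate the 8111 / 902 / 1002 split.

```python
import random
from collections import Counter, defaultdict

# Class counts from Table 1 of the paper.
CLASS_COUNTS = {"akiec": 327, "bcc": 514, "bkl": 1099, "df": 115,
                "mel": 1113, "nv": 6705, "vasc": 142}


def stratified_split(labels, fractions, seed=0):
    """Split sample indices into len(fractions) subsets, class by class.

    Each class is shuffled and cut in proportion to `fractions`, so every
    subset keeps roughly the class balance of the full dataset.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    subsets = [[] for _ in fractions]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        start = 0
        for k, fraction in enumerate(fractions):
            # The last subset takes the remainder so that no sample is lost.
            if k == len(fractions) - 1:
                end = len(indices)
            else:
                end = start + round(fraction * len(indices))
            subsets[k].extend(indices[start:end])
            start = end
    return subsets


def normalize(pixels):
    """Scale 8-bit pixel values to [0, 1] by dividing by 255."""
    return [value / 255 for value in pixels]


labels = [name for name, count in CLASS_COUNTS.items() for _ in range(count)]
# Fractions chosen to approximate the paper's 8111 / 902 / 1002 split.
fractions = (8111 / 10015, 902 / 10015, 1002 / 10015)
train, val, test = stratified_split(labels, fractions)
print(len(train), len(val), len(test))
print(Counter(labels[i] for i in test))
```

Splitting class by class matters here because the dataset is dominated by the nv class; a purely random split could leave a small class such as df almost absent from the validation or testing set.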


Figure 1. Dermoscopy image samples for the seven diagnostic categories


Table 1. Dermoscopy image data

No   Diagnostic category                    Number of images
1    Actinic keratoses (akiec)              327
2    Basal cell carcinoma (bcc)             514
3    Benign keratosis-like lesions (bkl)    1099
4    Dermatofibroma (df)                    115
5    Melanoma (mel)                         1113
6    Melanocytic nevi (nv)                  6705
7    Vascular lesions (vasc)                142
     Total                                  10015


2.2. Convolutional neural network (CNN) 

A CNN is an artificial neural network designed for processing data with a grid-like structure, such
as images. A simple CNN architecture usually has four types of layers: the convolutional layer, the
rectified linear unit (ReLU) layer, the pooling layer, and the fully connected layer [23]. CNNs have a
hierarchical architecture. Starting from the input signal $x$, each subsequent layer $x_j$ is given by (1),

$$x_j = \rho W_j x_{j-1} \tag{1}$$

where $W_j$ is a linear operator in the convolution layer, and $\rho$ is a rectifier $\max(x, 0)$ or a
sigmoid $1/(1 + \exp(-x))$. The operator $W_j$ is a stack of convolutions of the previous layer, as defined
in (2),

$$x_j(u, k_j) = \rho\Big(\sum_{k} \big(x_{j-1}(\cdot, k) * W_{j,k_j}(\cdot, k)\big)(u)\Big) \tag{2}$$

Here $*$ is the discrete convolution operator:

$$(f * g)(x) = \sum_{u=-\infty}^{\infty} f(u)\, g(x - u) \tag{3}$$
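
For finite sequences, the discrete convolution in (3) reduces to a double loop. The sketch below is illustrative only; the function name `discrete_convolution` and the example values are assumptions, and both sequences are treated as zero outside their index range.

```python
def discrete_convolution(f, g):
    """Full discrete convolution of two finite sequences.

    Implements (f * g)(x) = sum_u f(u) g(x - u), treating both sequences
    as zero outside their index range.
    """
    out = [0.0] * (len(f) + len(g) - 1)
    for u, fu in enumerate(f):
        for v, gv in enumerate(g):
            # With v = x - u, the product f(u) g(v) contributes to x = u + v.
            out[u + v] += fu * gv
    return out


signal = [1.0, 2.0, 3.0]
kernel = [1.0, 0.0, -1.0]
print(discrete_convolution(signal, kernel))  # [1.0, 2.0, 2.0, -2.0, -3.0]
```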


The optimization problem posed by a CNN is highly non-convex. The weights $W_j$ are learned by
stochastic gradient descent, with gradients computed by the backpropagation algorithm. The convolution
layer is used to learn features and identify classes from image datasets. The convolution operation with
the ReLU activation function in a CNN is expressed as (4),

$$y_j = \max\Big(0, \sum_{i} k_{ij} \otimes x_i + b_j\Big) \tag{4}$$

where $x_i$ and $y_j$ are the input and output feature maps, $k_{ij}$ is the convolution kernel, $b_j$ is
the bias, and $\otimes$ indicates the convolution operation. The convolution slides the kernel over the
input image, and each output value is calculated as the dot product between the kernel and the image patch
it covers. The representation of the CNN architecture for skin lesion classification can be seen in
Figure 2.
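
A small worked instance of (4) may help. The sketch below computes one output feature map from a list of input maps, assuming stride 1, no padding, and the sliding dot product without kernel flipping that deep learning frameworks commonly call convolution; the function names and the toy values are illustrative assumptions.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding).

    Each output value is the dot product of the kernel with one image patch.
    """
    kh, kw = len(kernel), len(kernel[0])
    rows = len(image) - kh + 1
    cols = len(image[0]) - kw + 1
    return [[sum(image[r + a][c + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for c in range(cols)]
            for r in range(rows)]


def conv_relu(inputs, kernels, bias):
    """One output feature map: max(0, sum_i conv(x_i, k_ij) + b_j)."""
    maps = [conv2d_valid(x, k) for x, k in zip(inputs, kernels)]
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[max(0.0, sum(m[r][c] for m in maps) + bias)
             for c in range(cols)]
            for r in range(rows)]


x = [[1, 2, 0],
     [0, 1, 3],
     [4, 0, 1]]
k = [[1, 0],
     [0, -1]]
print(conv_relu([x], [k], bias=0.5))  # [[0.5, 0.0], [0.5, 0.5]]
```

The top-right output is clipped to zero because the patch response there is 2 - 3 = -1, which stays negative after adding the bias of 0.5.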


Figure 2. CNN general architecture


CNNs are a DL architecture that is widely used to classify diseases from medical images [17]. In
previous studies, CNNs have been proposed for the classification of skin lesions from dermoscopy images
and have produced superior performance [19], [24], [25]. Here the architecture is used to extract features
from dermoscopy images. In this paper, we examine the performance of Inception V3 [26], Inception ResNet
V2 [27], and DenseNet 201 [28] separately, and of an ensemble of the three architectures, for the
classification of 7 classes of skin lesions. We removed the last fully connected layer of each of the three
architectures and replaced it with a new classification head consisting of one global max-pooling layer,
one fully connected layer with 512 neurons, one dropout layer with a probability of 0.5, and an output
layer with a SoftMax activation function for classifying the 7 types of skin lesions. Inception V3 is an
architecture based on the Inception module. This architecture consists of 9 Inception modules with 22
convolutional layers. The Inception module has convolution layers with three different kernel sizes (5x5,
3x3, and 1x1) and pooling layers with 3x3 filters [26]. Inception ResNet V2, introduced in [27] as an
improvement, is a variation of the Inception V3 model. Along with Inception V3 and Inception ResNet V2, we
use DenseNet 201. The dense convolutional network (DenseNet) is an architecture that connects each layer
to every other layer with the same feature map size. DenseNets have many benefits, including alleviating
the vanishing-gradient problem, strengthening feature propagation, encouraging feature reuse, and greatly
reducing the number of parameters [28]. The three CNN architectures are combined into an ensemble learning
architecture as described in Figure 3.
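
The replacement classification head can be sketched in plain Python as an inference-time forward pass. This is an illustration under stated assumptions rather than the authors' implementation: the backbone output is replaced by random toy feature maps with 8 channels, the weights are random, and the function names are hypothetical. The ReLU on the 512-unit layer follows the training setup described in section 2.3, and dropout is omitted because it is inactive at inference.

```python
import math
import random


def global_max_pool(feature_maps):
    """Reduce each H x W feature map to its maximum value."""
    return [max(max(row) for row in fmap) for fmap in feature_maps]


def dense(inputs, weights, biases, activation=None):
    """Fully connected layer: one weight row and one bias per output unit."""
    outputs = [sum(w * x for w, x in zip(row, inputs)) + b
               for row, b in zip(weights, biases)]
    if activation == "relu":
        outputs = [max(0.0, v) for v in outputs]
    return outputs


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(feature_maps, w_hidden, b_hidden, w_out, b_out):
    """Global max pooling, then a ReLU dense layer, then a softmax output."""
    pooled = global_max_pool(feature_maps)
    hidden = dense(pooled, w_hidden, b_hidden, activation="relu")
    return softmax(dense(hidden, w_out, b_out))


rng = random.Random(0)
channels, hidden_units, classes = 8, 512, 7  # 8 channels is a toy value
feature_maps = [[[rng.random() for _ in range(4)] for _ in range(4)]
                for _ in range(channels)]
w_hidden = [[rng.gauss(0, 0.1) for _ in range(channels)]
            for _ in range(hidden_units)]
w_out = [[rng.gauss(0, 0.1) for _ in range(hidden_units)]
         for _ in range(classes)]
probs = classify(feature_maps, w_hidden, [0.0] * hidden_units,
                 w_out, [0.0] * classes)
print(len(probs), round(sum(probs), 6))
```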


[Flowchart: the HAM10000 dataset is passed to the Inception V3, Inception ResNet V2, and DenseNet 201
models; their probability predictions are averaged to give the final prediction.]

Figure 3. Proposed CNNs with ensemble learning by using three architectures: Inception V3, Inception
ResNet V2, and DenseNet 201


2.3. Training and testing

Our model is implemented in a Python artificial intelligence framework. We use the ReLU activation
function at the fully connected layer, the SoftMax activation function at the output layer, categorical
cross-entropy as the loss function, and the Adam optimization algorithm with an initial learning rate of
0.0001. The network parameters are initialized with weights pre-trained on ImageNet. In this paper,
features are first extracted by running images through the pre-trained network as a feature extractor.
Fine-tuning is then applied to extract more specific features. At this stage, several experiments were
carried out. The first experiment fine-tuned the Inception V3 model by training all layers. The second
experiment fine-tuned the Inception ResNet V2 model by training the top layer. The third experiment
fine-tuned the DenseNet 201 model by training all layers. After training the individual CNN models, we
built an ensemble by combining all three models to classify the 7 types of skin lesions. The ensemble takes
the average of the output probabilities of the three models. The ensemble of CNNs was run on a Tesla NVIDIA
GeForce RTX 2080 GPU and an Intel(R) Core(TM) version 9 processor with a 3.60 GHz clock rate. Processing
each image took 9 s and 147 ms at test time.
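
The averaging step of the ensemble can be sketched as follows. The three probability vectors are hypothetical softmax outputs for a single image, not results from the paper, and the function name `ensemble_predict` and the label order are assumptions made for illustration.

```python
LABELS = ["akiec", "bcc", "bkl", "df", "mel", "nv", "vasc"]


def ensemble_predict(model_probabilities):
    """Average per-class probabilities across models and pick the top class."""
    n_models = len(model_probabilities)
    averaged = [sum(column) / n_models for column in zip(*model_probabilities)]
    best = max(range(len(averaged)), key=averaged.__getitem__)
    return averaged, best


# Hypothetical softmax outputs of the three models for one image.
inception_v3 = [0.05, 0.05, 0.10, 0.05, 0.40, 0.30, 0.05]
inception_resnet_v2 = [0.05, 0.05, 0.10, 0.05, 0.30, 0.40, 0.05]
densenet_201 = [0.02, 0.03, 0.05, 0.05, 0.60, 0.20, 0.05]

averaged, best = ensemble_predict(
    [inception_v3, inception_resnet_v2, densenet_201])
print(LABELS[best])  # mel
```

In this toy case one model prefers nv on its own, but the averaged probability for mel (about 0.43) exceeds that for nv (0.30), so the ensemble predicts mel.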


3. EXPERIMENTAL RESULTS AND ANALYSIS 

To test the proposed model, 1002 dermoscopy images are used. Before building the ensemble model,
nine single-CNN configurations were developed to assess the ability of individual CNN models (see Table 2).
Table 2 shows that the fine-tuned DenseNet 201 architecture outperforms the other architectures, but its
sensitivity is still below 90%, and overfitting between training and testing occurred. To overcome this
drawback, the ensemble is built from three architectures: the Inception V3 model, the Inception ResNet V2
model, and the DenseNet 201 model. The three CNN architectures with the best performance are combined, and
the average accuracy, sensitivity, specificity, precision, and F1 score are calculated. All values were
obtained from the confusion matrix. The classification performance results are given in Table 3, and the
confusion matrix is shown in Figure 4.
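
As a check on how the averaged metrics follow from the confusion matrix, the sketch below computes one-vs-rest accuracy, sensitivity, specificity, precision, and F1 for each class and macro-averages them. The counts are read from Figure 4, with the two cells that are illegible in the extracted figure taken as 13 and 11 (an assumption, chosen so that the matrix sums to the 1002 test images). The function name `macro_metrics` and its `reference` argument are illustrative, not the authors' code.

```python
LABELS = ["akiec", "bcc", "bkl", "df", "nv", "vasc", "mel"]

# Counts read from Figure 4. The vasc/vasc and nv/mel cells are garbled in
# the extracted figure and are assumed here to be 13 and 11.
CONFUSION = [
    [15, 2, 10, 0, 1, 0, 0],
    [0, 47, 6, 0, 2, 0, 2],
    [1, 0, 93, 0, 8, 0, 6],
    [0, 0, 0, 14, 1, 0, 0],
    [0, 0, 15, 0, 652, 0, 11],
    [0, 0, 1, 0, 0, 13, 0],
    [0, 1, 10, 0, 20, 0, 71],
]


def macro_metrics(cm, reference="columns"):
    """Macro-averaged one-vs-rest metrics from a square confusion matrix.

    `reference` names the axis that holds the reference (true) class.
    Assumes every class has at least one reference and one predicted sample.
    """
    n = len(cm)
    total = sum(map(sum, cm))
    row_sums = [sum(row) for row in cm]
    col_sums = [sum(col) for col in zip(*cm)]
    if reference == "columns":
        true_sums, pred_sums = col_sums, row_sums
    else:
        true_sums, pred_sums = row_sums, col_sums

    sums = {"accuracy": 0.0, "sensitivity": 0.0, "specificity": 0.0,
            "precision": 0.0, "f1": 0.0}
    for c in range(n):
        tp = cm[c][c]
        fn = true_sums[c] - tp
        fp = pred_sums[c] - tp
        tn = total - tp - fn - fp
        sums["accuracy"] += (tp + tn) / total
        sums["sensitivity"] += tp / (tp + fn)
        sums["specificity"] += tn / (tn + fp)
        sums["precision"] += tp / (tp + fp)
        sums["f1"] += 2 * tp / (2 * tp + fp + fn)
    return {name: value / n for name, value in sums.items()}


metrics = macro_metrics(CONFUSION)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

With each column taken as the reference class, this gives 97.23% accuracy, 90.12% sensitivity, 97.73% specificity, and 82.01% precision, matching the ensemble row of Table 3, and an F1 score of about 85.02%. Passing `reference="rows"` swaps the averaged sensitivity and precision.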

The single fine-tuned DenseNet 201 model gives very good results even though it does not have as
many parameters as Inception V3 and Inception ResNet V2. The performance of DenseNet 201 in this
experiment suggests that this model can be trained on different datasets. Using the ensemble learning
approach, we built an ensemble of the previously fine-tuned Inception V3, Inception ResNet V2, and
DenseNet 201 models, which showed the best classification results with an average accuracy of 97.23%. The
ensemble of these three architectures improves the accuracy and the prediction results by taking the
average of the output probabilities of the models.

Furthermore, a comparative analysis was carried out to assess the proposed method against several
studies on the multi-class classification of skin lesions. Table 4 summarizes the results of the proposed
approach and compares them with previous approaches to the multi-class classification of skin lesions
using dermoscopy images. As shown in Table 4, the proposed ensemble model produces better accuracy than
the other studies. The performance results were validated in the training, validation, and testing
processes; we attribute the better performance to the deeper architecture of the ensemble model.

This study has several limitations: the number of dermoscopy images per class in the dataset is
imbalanced, and the single CNN models still overfit. We did not succeed in finding better training
strategies that could help the single CNN models achieve better results and reduce overfitting. Through
experimentation, we also found that the transfer learning approach does not provide good performance on
this dataset because of the major differences in features between dermoscopic images and ImageNet images.
Further investigation is needed in future work to avoid overfitting and to examine other ensemble
techniques and CNN classification architectures for the task of classifying skin lesions.


Table 2. Single CNN architectures with fine-tuning

Model                              Accuracy   Sensitivity   Specificity
VGG16                              94.58%     71.53%        95.52%
MobileNet                          93.04%     0             94.44%
ResNet50                           95.03%     73.54%        95.93%
Inception V3                       93.86%     67.49%        95.01%
Fine Tuning Inception V3           96.03%     76.57%        81.08%
Inception ResNet V2                95.86%     71.06%        77.90%
Fine Tuning Inception ResNet V2    96.03%     81.40%        96.32%
DenseNet 201                       95.86%     82.45%        96.34%
Fine Tuning DenseNet 201           97.09%     86.05%        97.51%


Table 3. The performance evaluation results with the best model in ensemble architecture

Model                              Avg accuracy   Avg sensitivity   Avg specificity   Avg precision   Avg F1 score
Fine Tuning Inception V3           96.03%         76.57%            81.08%            96.46%          78.28%
Fine Tuning Inception ResNet V2    96.03%         81.40%            96.32%            96.56%          83.57%
Fine Tuning DenseNet 201           97.09%         86.05%            97.51%            81.43%          83.19%
Ensemble                           97.23%         90.12%            97.73%            82.01%          85.01%


Confusion Matrix

         akiec   bcc   bkl    df    nv  vasc   mel
akiec       15     2    10     0     1     0     0
bcc          0    47     6     0     2     0     2
bkl          1     0    93     0     8     0     6
df           0     0     0    14     1     0     0
nv           0     0    15     0   652     0    11
vasc         0     0     1     0     0    13     0
mel          0     1    10     0    20     0    71

                  Predicted labels


Figure 4. The confusion matrix of the ensemble model (Inception V3, Inception ResNet V2, and DenseNet
201 combined)


Table 4. Comparison of performance from previous research in multi-class classification of skin lesions

Author                       Dataset     Avg accuracy   Avg sensitivity   Avg specificity
(Harangi, 2018) [24]         ISBI 2017   86.6%          55.6%             78.5%
(Shahin et al., 2019) [25]   HAM10000    89.9%          79.6%             86.2%
Proposed approach            HAM10000    97.23%         90.12%            97.73%


4. CONCLUSION 

Diagnosing skin lesions is very challenging because of the high inter-class similarities and
intra-class differences between lesions in terms of color, size, location, and appearance. In this study,
we propose an ensemble learning approach that can classify seven diagnostic categories of skin lesions. We
built the ensemble model by combining three deep CNN architectures: Inception V3, Inception ResNet V2, and
DenseNet 201. Our model classifies 7 classes of skin lesions with an average accuracy, sensitivity,
specificity, precision, and F1 score of 97.23%, 90.12%, 97.73%, 82.01%, and 85.01%, respectively. With the
proposed ensemble model, we achieve classification performance higher than that reported in the previous
studies listed in Table 4. The experimental results indicate that the proposed framework exhibits
promising results.